Efficient counting of k-mers in DNA sequences using a bloom filter
Similar resources
The veracious counting bloom filter
Counting Bloom Filters (CBFs) are widely employed in many applications for fast membership queries. A CBF supports dynamic sets, rather than only a static set, through item insertions and deletions. A CBF allows false positives but not false negatives. The Bh-Counting Bloom Filter (Bh-CBF) and Variable Increment Counting Bloom Filter (VI-CBF) were introduced to reduce the false positive probability, but they su...
Classification of DNA sequences using Bloom filters
MOTIVATION: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. RESULTS: A novel algorithm, fast and accurate classification of sequences (FACSs), is introduced that can a...
Improving counting Bloom filter performance with fingerprints
Bloom filters (BFs) are used in many applications for approximate check of set membership. Counting Bloom filters (CBFs) are an extension of BFs that enable the deletion of entries at the cost of additional storage requirements. Several alternatives to CBFs can be used to reduce the storage overhead. For example, schemes based on d-left hashing or Cuckoo has...
These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory-efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix array...
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
MOTIVATION: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be...
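The abstracts above describe two ideas that combine naturally: counting every k-mer (substring of length k) in a sequence, and a counting Bloom filter that replaces bits with counters so that items can be inserted and deleted. The following is a minimal illustrative sketch of that combination, not the method of any paper listed here. The class name `CountingBloomFilter`, the function `count_kmers`, the filter size, and the use of salted BLAKE2b as the hash family are all assumptions chosen for the example.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter: counters instead of bits."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item: str) -> set[int]:
        # Assumption: derive each hash function by salting BLAKE2b with its index.
        # A set is used so one item never touches the same counter twice.
        positions = set()
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), salt=i.to_bytes(8, "little")
            ).digest()
            positions.add(int.from_bytes(digest[:8], "little") % self.num_counters)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Only valid for items that were previously added.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def count(self, item: str) -> int:
        # The smallest counter is an upper bound on the true count:
        # collisions can inflate it (false positives), but never shrink it.
        return min(self.counters[pos] for pos in self._positions(item))


def count_kmers(sequence: str, k: int, cbf: CountingBloomFilter) -> None:
    """Insert every substring of length k of `sequence` into the filter."""
    for start in range(len(sequence) - k + 1):
        cbf.add(sequence[start:start + k])


if __name__ == "__main__":
    cbf = CountingBloomFilter(num_counters=1024, num_hashes=3)
    count_kmers("ACGTACGT", 4, cbf)
    # "ACGT" occurs twice in the sequence, so the estimate is at least 2.
    print(cbf.count("ACGT"))
```

Because the estimate is the minimum over the selected counters, it can overestimate a k-mer's abundance but cannot underestimate it, which matches the "false positives but not false negatives" property stated above. Memory use depends only on `num_counters`, not on how many distinct k-mers are seen.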
Journal
Journal title: BMC Bioinformatics
Year: 2011
ISSN: 1471-2105
DOI: 10.1186/1471-2105-12-333